This is a data science project I started to convince myself to stop stressing on my commute to work. I asked myself: "How much time am I really saving when I rush on my drive to work? What is it costing me to drive faster?" I drive a 2016 Prius; it can't really cost that much to save a bit of time, right? From those initial questions the project grew with more questions, and eventually became a way to practice data analysis skills on a unique data set I was intimately familiar with. It has since turned into a personal data analysis portfolio: I work in a field where many people have open publications, but I have been at commercial firms my entire career and had no demonstration of my skills that I could openly share.
All data was collected by me and logged into a Google Drive spreadsheet at the completion of each commute. Anyone interested has my blessing to use the data for practice and learning.
As happens with any data analysis, I realized as time went on that the next set of questions that came to mind required more data. I began collecting additional parameters as I thought of them, both to answer more fun questions and to explain some large events that had a significant impact on the results. These included getting new tires and the closure of a bridge along my daily commute (the Washington Bridge in Providence, RI). I was able to WFH in the first couple of weeks of the closure, so luckily/sadly I don't have data from that period.
Follow the rules of the road (with the exception of the speed limit, which is arguable, since not interrupting the flow of traffic is both a law and a safety concern; there's a weird phenomenon where people on the east coast physically cannot see speed limit signs. I've seen cops blending in with traffic going ~30 MPH over the speed limit and doing nothing about it)
Preserve my sanity by driving however is easiest and least stressful
Don’t be a dick and significantly inconvenience other people
Use cruise control as much as possible to stay at the stated speed
While driving on my 39-mile commute on the east coast, I aimed to use cruise control as much as possible to simplify my drive. I drive in the middle lane if there are 3 lanes, or the right lane if it's a 2-lane highway. I don't have adaptive cruise control, so if I came up on someone going slower than me I would pass them. Occasionally people would be passing me at the same time, so I would slow down until they passed, then resume my speed and pass. Occasionally someone would come up behind me while I was already passing, so I would speed up to finish the pass, let them by, and then slow back down.
These rules obviously lead to a fair bit of inherent variability. The goal of this project started as stress reduction, so I stuck to those rules and accepted a bit of extra variability in my models to maintain my sanity.
I will intermittently upload the data to this GitHub repo, but my daily tracker can be accessed here
I've added the data manipulations from later parts of the data collection and analysis to this initial data loading and wrangling section for convenience and simplification; this document is intended to read as a logical sequence rather than a chronological one.
I've set an arbitrary gas cost in $/gallon based on my time on the east coast. It might change or be updated, but for now it is a largely irrelevant part of the analysis.
require(googledrive)
require(viridis)
require(ggResidpanel)
require(plotly)
require(tidyverse)

cost <- 3.40   # assumed gas price, $/gallon
distance <- 39 # one-way commute, miles

drive_download("MPG", overwrite = TRUE, type = "csv")
mpg <- read_csv("MPG.csv",
                name_repair = "universal") |>
  mutate(time_if_max_speed = distance / mph * 60,  # minutes if the cruise speed were held the whole way
         effective_mph = distance / (time / 60)) |> # average speed actually achieved
  mutate(gals_used = distance / mpg,
         cost = gals_used * cost,                   # trip cost at the assumed $/gallon
         traffic = time - time_if_max_speed,        # minutes lost to slowdowns
         `effective mph` = as.numeric(effective_mph)) |>
  filter(!is.na(mpg))

# Fill in defaults for rows logged before these columns were tracked
mpg$tires <- mpg$tires |>
  replace_na("new")
mpg$bridge <- mpg$bridge |>
  replace_na("closed")
mpg$year <- mpg$year |>
  replace_na(2024)

# Combine the day/month string with the year into a proper Date
mpg$date <- mpg |>
  mutate(date_full = paste0(date, year)) |>
  pull(date_full) |>
  as.Date(format = "%d%b%Y")
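To make the derived metrics concrete, here's a sketch with a single hypothetical commute row (the numbers are made up for illustration; `trip_cost` stands in for the `cost` column above to avoid shadowing the gas price):

```r
library(dplyr)

distance <- 39       # one-way commute, miles
cost_per_gal <- 3.40 # assumed gas price, $/gallon

# Hypothetical trip: cruise set to 65 mph, 45 minutes door to door, 52 mpg
example <- tibble(mph = 65, time = 45, mpg = 52) |>
  mutate(time_if_max_speed = distance / mph * 60,  # 36 min if never slowed
         effective_mph = distance / (time / 60),   # 52 mph actually averaged
         gals_used = distance / mpg,               # 0.75 gallons burned
         trip_cost = gals_used * cost_per_gal,     # ~$2.55 for the trip
         traffic = time - time_if_max_speed)       # 9 min lost to traffic
```

So even on a day when slowdowns cost 9 minutes, the whole 39-mile trip only burns about $2.55 of gas.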
Here's the first check of what the data looked like, since the first obvious question is: what is the effect of my speed on my mileage?
ggplotly(
  mpg |>
    ggplot(aes(mph, mpg)) +
    geom_point() +
    geom_smooth(method = "lm") +
    labs(title = "Cruise Control Setting"),
  dynamicTicks = TRUE
)
I had already started tracking when I got new tires, and immediately noticed the drop in mileage. Here’s what it looks like when I break it out by old vs new tires.
ggplotly(
  mpg |>
    ggplot(aes(mph, mpg)) +
    geom_point() +
    geom_smooth(method = "lm") +
    facet_wrap(facets = "tires") +
    labs(title = "Cruise Control Setting"),
  dynamicTicks = TRUE
)
And another way to look at it:
ggplotly(
  mpg |>
    ggplot(aes(mph, mpg, color = tires)) +
    geom_point() +
    geom_smooth(method = "lm") +
    labs(title = "Cruise Control Setting"),
  dynamicTicks = TRUE
)
In case anyone is not familiar with how tires work and what these results imply: I basically had racing slicks with almost no tread. Rolling resistance is a real thing, folks.
ggplotly(
  mpg |>
    ggplot(aes(mph, mpg, color = date)) +
    geom_point() +
    scale_color_viridis(),
  dynamicTicks = TRUE
)
Hmmm, the old vs. new tires have a clear separation, but my gut says there's still a seasonality effect that I'm not correcting for…
Let’s re-arrange the variables a bit.
ggplotly(
  mpg |>
    ggplot(aes(date, mpg, color = mph)) +
    geom_point() +
    geom_vline(xintercept = as.Date("2023-05-28")) + # tire change
    scale_color_viridis(),
  dynamicTicks = TRUE
)
Seems like there's a pretty nice clean drop-off at the time I changed to new tires, but still a decline after November. It's interesting because while temperatures around where I live start dropping around ~November, which would account for a drop in mileage and efficiency, it wasn't a large change from November to December.
ggplotly(
  mpg |>
    ggplot(aes(date, mpg, color = mph)) +
    geom_point() +
    geom_vline(xintercept = as.Date("2023-05-28")) +
    geom_vline(xintercept = as.Date("2023-11-01")) +
    geom_vline(xintercept = as.Date("2023-12-13")) +
    scale_color_viridis(),
  dynamicTicks = TRUE
)
Sadly for my mental health, there's an obvious answer… the Washington Bridge closure.
Unsurprisingly, some process control charts would have been useful tools for detecting these large shifts in performance.
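As a sketch of what that could look like, here's a minimal individuals-style control chart helper: it computes a center line and ±3σ limits from a baseline window of the data and flags points outside the limits. The function name and the baseline-window approach are my own assumptions; the column names (`date`, `mpg`) match the data above.

```r
library(dplyr)
library(ggplot2)

# Flag points outside mean +/- 3*sd limits computed from a baseline window.
# `df` is assumed to have `date` and `mpg` columns like the data above;
# `baseline_end` is the last date of the "in control" period.
add_control_limits <- function(df, baseline_end) {
  baseline <- filter(df, date <= baseline_end)
  center <- mean(baseline$mpg)
  sigma  <- sd(baseline$mpg)
  mutate(df,
         center = center,
         lcl = center - 3 * sigma,
         ucl = center + 3 * sigma,
         out_of_control = mpg < lcl | mpg > ucl)
}

# Plotted as an individuals chart (commented so this sketch runs standalone):
# add_control_limits(mpg, as.Date("2023-05-28")) |>
#   ggplot(aes(date, mpg, color = out_of_control)) +
#   geom_point() +
#   geom_hline(aes(yintercept = center), linetype = "dashed") +
#   geom_hline(aes(yintercept = lcl)) +
#   geom_hline(aes(yintercept = ucl))
```

A shift like the tire change or the bridge closure would show up as a run of flagged points rather than having to be eyeballed from a scatter plot.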
# TODO add LM model for mpg vs mph
# plot resid vs date to check for seasonality
#
# mpg |>
#   filter(tires == "new") |>
#   ggplot(aes(date))
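One way the TODO above could be filled in: fit the linear model of mileage vs. cruise setting on the new-tire subset, then plot its residuals against date. A smooth seasonal wave in the residuals would suggest a temperature effect the speed-only model is missing. The helper name is my own; the column names (`mpg`, `mph`, `tires`, `date`) match the data above.

```r
library(dplyr)

# Fit mpg ~ mph on the new-tire subset and attach residuals by date.
seasonality_residuals <- function(df) {
  new_tires <- filter(df, tires == "new")
  fit <- lm(mpg ~ mph, data = new_tires)
  mutate(new_tires, resid = resid(fit))
}

# Usage (commented so this sketch runs standalone):
# seasonality_residuals(mpg) |>
#   ggplot(aes(date, resid)) +
#   geom_point() +
#   geom_smooth() +
#   geom_hline(yintercept = 0, linetype = "dashed")
```

Since the model has an intercept, the residuals average to zero by construction, so any systematic drift of the points above or below the dashed line across the seasons is the signal to look for.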